Fault Tolerance Through Automated Diversity in the Management of Distributed Systems
Abstract
Software reliability is often the main goal of the software development process. Despite continuing improvements in fault-prevention techniques, faults remain in every complex software system. In contrast to hardware faults, no concepts or mechanisms for tolerating general software faults have become widely accepted. In this paper we present a new concept for the design of system management that enables the tolerance of software faults in the executed applications. We explain how the underlying resource management of a distributed system affects the occurrence of errors in a faulty application and how re-executing the application with altered resource management decisions prevents the occurrence of errors with a certain probability. Our contribution is to motivate a new approach to fault tolerance in distributed systems and to give system designers a general concept that states which decision spaces can be used to tolerate faults and how different alternatives in one decision space can be evaluated and generated. The concept is applicable for a large class of system mod-
Similar resources
A Genetic Based Resource Management Algorithm Considering Energy Efficiency in Cloud Computing Systems
Cloud computing is a result of the continuing progress made in the areas of hardware, Internet-related technologies, distributed computing and automated management. Increasing demand has led to an increase in services, resulting in the establishment of large-scale computing and data centers, along with high operating costs and huge amounts of electrical power consumption. Insuffic...
Improving the palbimm scheduling algorithm for fault tolerance in cloud computing
Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...
Influence of Fault Current Limiter in Voltage Drop and TRV Considering Wind Farm
Distributed generation systems in distribution systems can increase the level of short-circuit current. The effectiveness of distributed generation systems is affected by the size, location, and type of distributed generation technology, and by the methods of connecting to distribution systems. A wind turbine system is one example of a distributed generation source. Not only does...
Software Diversity and Fault-Tolerance: An Overview
The design of reliable and fault-free software is a major concern for safety-critical real-time and distributed applications. The fault-tolerance community addresses these problems through redundancy in hardware components and through diversity, using different software components. Diversity has been used for many years now as a computer defence mechanism to achieve an acceptable degree of fault-t...
Automated Stream-Based Analysis of Fault-Tolerance
A rigorous, automated approach to analyzing fault-tolerance of distributed systems is presented. The method is based on a stream model of computation that incorporates approximation mechanisms. One application is described: a protocol for fault-tolerant moving agents.